
Abstract 

The maximum entropy ansatz, as it is often invoked in the context of time-series analysis, suggests the selection
of a power spectrum which is consistent with autocorrelation data and corresponds to a random process least
predictable from past observations. We introduce and compare a class of spectra with the property that the underlying
random process is least predictable at any given point from the complete set of past and future observations. In this
context, randomness is quantified by the size of the corresponding smoothing error and deterministic processes are
characterized by integrability of the inverse of their power spectral densities, as opposed to the log-integrability in
the classical setting. The power spectrum which is consistent with a partial autocorrelation sequence and corresponds
to the most random process in this new sense is no longer rational but generated by finitely many fractional poles.

Index Terms

Entropy rate, randomness, time-arrow, predictability, smoothing.

I. Introduction

There is a special place reserved in the spectral analysis literature for the maximum entropy ansatz,
and rightly so, due to the multitude of analytic, computational, and practical qualities of maximum
entropy spectra. The relevant theory is firmly rooted in analytic interpolation, the moment problem, and
the Hilbert space geometry of random processes. The maximum entropy (ME) ansatz, in its basic form,
calls for selecting the unique power spectrum which is consistent with a known finite set of autocorrelation
moments and is the extremum of a convex logarithmic functional which represents the entropy rate of the
underlying random process. A closely related alternative justification relies on the fact that this maximum
entropy process (ME process) is the least predictable from past observations and hence, it represents a
worst-case situation.

The entropy rate of a random process is an inherently time-dependent concept. This fact becomes
apparent in multivariable prediction theory where the variance of optimal least-variance predictors depends
on the choice of the time-arrow [6, Remark 3]. It is our contention that often there is no natural direction
of time. This is the case when statistics are obtained from an array of sensors and the index of the
autocorrelation moments represents spatial separation. It is the case when we consider sparse records
with both past and future data available, but with possible gaps. It is also the case when we want to
estimate the power spectrum and have no plans to use it for prediction in one direction or the other. In all such
cases the rationale of the ME ansatz may be called into question. Hence, the purpose of this work is to
study a time-arrow independent counterpart. In this, a power spectrum is selected so that the underlying
random process evaluated at any point in time is the least predictable from the complete set of all other
past and future values. In other words, it is the variance of the optimal smoothing filter which is sought
to be maximal, as opposed to the variance of the optimal (time-arrow dependent) predictor.

Power spectra which are consistent with a finite set of (contiguous) autocorrelation statistics and
correspond to a worst-case smoothing error for the relevant random process turn out to have an all-pole
representation, very much like the ones that result from the ME ansatz but with one important
difference. These spectra are inverses of the square root of positive trigonometric polynomials, and
hence, their poles are fractional. They also share a similar property with ME spectra in that they are
extrema of a corresponding convex functional which, however, is no longer logarithmic. Computation
of their respective parameters is slightly more involved than having to solve linear (Yule-Walker-Levinson)
equations. They can be computed as fixed points of suitable differential equations originating from
a homotopy-based method for determining functional extrema. For convenience, and lacking a better
terminology, we refer to this new class of spectra and the respective processes as most random (MR).

The maximum entropy ansatz has a history of fifty years or more. We will not attempt to survey significant
milestones but refer to [13] for textbook exposition of relevant material, to [9], [11] for an overview of 
relevant research in signal processing, to Burg [2] who is credited with introducing the maximum entropy 
ansatz in time series analysis, and to Jaynes [10] and Csiszar [3] for systematic analyses of the ansatz 
and its relevance in scientific modeling. 



II. Development and main results 



As explained in the introduction, we consider the problem of spectral analysis based on partial
autocorrelation statistics. Thus, we begin with a finite set of autocorrelation samples
$R_k := \mathcal{E}\{u_\ell u_{\ell-k}^*\}$, for $k = 0, 1, 2, \ldots, n$, of a zero-mean, stationary scalar
random process $\{u_\ell : \ell \in \mathbb{Z}\}$, where "$*$" denotes
complex conjugation (together with transposition when applied to vectorial quantities). The discrete "time
index" may represent a spatial coordinate when the $u_\ell$'s are readings at, say, a number of uniformly and
linearly spaced sensor locations.



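Without loss of generality we assume that

$$\mathbf{R}_n := \begin{bmatrix} R_0 & R_1^* & \cdots & R_n^* \\ R_1 & R_0 & \cdots & R_{n-1}^* \\ \vdots & \vdots & \ddots & \vdots \\ R_n & R_{n-1} & \cdots & R_0 \end{bmatrix} > 0,$$

i.e., that it is positive definite, for otherwise there is a unique power spectrum $d\mu(\theta)$ for which

$$R_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-jk\theta}\, d\mu(\theta), \quad \text{for } k = 0, 1, \ldots, n, \tag{1}$$

see e.g., [13], [8]. The following theorem summarizes known facts about the maximum entropy power
spectrum which is consistent with $\mathbf{R}_n$.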

Theorem 1: Provided $\mathbf{R}_n > 0$ there exists a unique power spectrum $d\mu_{\rm ME}$ (i.e., a nonnegative
measure on $[-\pi, \pi)$) which satisfies (1) and is a minimizer of the following convex functional

$$\mathbb{I}(d\mu/d\theta) := -\frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left(\frac{d\mu}{d\theta}\right) d\theta. \tag{2}$$

Further, $d\mu_{\rm ME}$ is absolutely continuous (with respect to the Lebesgue measure) and of the form

$$d\mu_{\rm ME} = f_{\rm ME}\, d\theta$$

where the spectral density $f_{\rm ME}(\theta)$ is the inverse of a positive trigonometric polynomial of degree at
most $n$, i.e.,

$$f_{\rm ME}(\theta) = \frac{\kappa_{\rm ME}^2}{|a(e^{j\theta})|^2}$$

with $\kappa_{\rm ME}^2 > 0$, and $a(z) = 1 + a_1 z + \ldots + a_n z^n$. The polynomial $a(z)$ can be selected to have all of its
roots in the complement $\{z : |z| > 1\}$ of the closed unit disc (with $\mathbb{D} := \{z : |z| < 1\}$ denoting the open
unit disc) of the complex plane, in which case

$$\alpha_k = \begin{cases} -a_k & \text{for } 1 \le k \le n \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

is the (unique) minimizer of the variance

$$\mathcal{E}_{d\mu_{\rm ME}}\{|u_0 - \hat{u}_{0|{\rm past}}|^2\}$$

of the (one-step-ahead) prediction error when the predictor

$$\hat{u}_{0|{\rm past}} := \sum_{k>0} \alpha_k u_{-k} \tag{4}$$

is sought to depend linearly on past observations. In general, the minimal variance of the prediction
error depends on the choice of $d\mu$ (which is subject to (1)). This variance is maximal when $d\mu_{\rm ME}$ is
selected, i.e., the maximum entropy power spectrum solves the min-max problem:

$$\max_{d\mu}\, \min_{\alpha} \left\{ \mathcal{E}_{d\mu}\Big\{\Big|u_0 - \sum_{k>0} \alpha_k u_{-k}\Big|^2\Big\} : \text{(1) holds} \right\}.$$

In the theorem and throughout, $d\mu/d\theta = f$ denotes the power spectral density function which is
independent of any possible singular part of the spectral measure $d\mu$. The theorem is well known and has
its roots in the classical theory of moments and the theory of orthogonal polynomials. For a proof see
[7], [8], cf. [13]. More specifically, the extremal properties of $a(z)$ are established in e.g., [8, page 38],
see also [7, Chapter VIII]. The fact that $f_{\rm ME}$ is consistent with the autocorrelation moment constraints
inherited by $\mathbf{R}_n$ follows from [7, Equations (1.15), (1.18)]. On the other hand, the entropy functional $\mathbb{I}(\cdot)$
is clearly convex and a variational argument easily shows that the minimizer is of the form indicated. The
last statement follows from the fact that (see [8, page 38, section 2.2]) the minimum of
$\mathcal{E}_{d\mu_{\rm ME}}\{|u_0 - \sum_{k>0} \alpha_k u_{-k}|^2\}$ is achieved for the choice of $\alpha_k$ in (3),
which involves only the lags $1 \le k \le n$, while

$$\mathcal{E}_{d\mu}\Big\{\Big|u_0 - \sum_{k=1}^{n} \alpha_k u_{-k}\Big|^2\Big\}$$

is independent of $d\mu$ as long as (1) holds. An alternative derivation of all the claims in the theorem can
be constructed in a way analogous to the steps used in the proof of Theorem 2 below, which we provide
in Section VI.

The functional $\mathbb{I}(\cdot)$ in Theorem 1 can be interpreted to represent entropy rate (see [9]) and has been
introduced into time-series modeling by Burg [2]. It is also interesting to note that the maximum entropy
solution $d\mu_{\rm ME}$ together with the $\alpha_k$'s in (3) represent a saddle point of
$\mathcal{E}_{d\mu}\{|u_0 - \sum_{k>0} \alpha_k u_{-k}|^2\}$ seen as a
function of two variables, $d\mu$ and the infinite coefficient vector $(\alpha_1, \alpha_2, \ldots)$.

An alternative choice for a solution to (1), corresponding to the least predictable process (MR-process)
from combined past and future values, can also be obtained via convex optimization of a suitable functional.
The following theorem presents this MR-solution and highlights its justification as the worst-case
scenario with regard to a corresponding smoothing problem. The development mirrors the case of the
ME-solution.

Theorem 2: Provided $\mathbf{R}_n > 0$ there exists a unique power spectrum $d\mu_{\rm MR}$ (nonnegative measure
on $[-\pi, \pi]$) which satisfies (1) and is a minimizer of the following convex functional

$$\mathbb{J}(d\mu/d\theta) := \frac{1}{2\pi}\int_{-\pi}^{\pi} \left(\frac{d\mu}{d\theta}\right)^{-1} d\theta. \tag{5}$$

Further, $d\mu_{\rm MR}$ is absolutely continuous (with respect to the Lebesgue measure) and of the form

$$d\mu_{\rm MR} = f_{\rm MR}\, d\theta$$

where the spectral density $f_{\rm MR}(\theta)$ is the square root of the inverse of a positive trigonometric
polynomial of degree at most $n$, i.e.,

$$f_{\rm MR}(\theta) = \frac{\kappa_{\rm MR}^2}{\sqrt{b(e^{j\theta})}}$$

with $\kappa_{\rm MR}^2 > 0$ and

$$b(e^{j\theta}) = b_{-n} e^{-nj\theta} + \ldots + b_0 + \ldots + b_n e^{nj\theta} > 0$$

for $\theta \in [-\pi, \pi]$ (and $b_{-k} := b_k^*$). The constant $\kappa_{\rm MR}^2$ can be selected so that the trigonometric
polynomial $b(e^{j\theta})$ satisfies

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \sqrt{b(e^{j\theta})}\, d\theta = 1, \tag{6}$$

in which case,

$$\beta_k = -\rho_k \quad \text{when } 1 \le |k|,$$

with $\rho_k$ the coefficients of the Fourier series of

$$\sqrt{b(e^{j\theta})} = \ldots + \rho_{-2} e^{-2j\theta} + \rho_{-1} e^{-j\theta} + 1 + \rho_1 e^{j\theta} + \rho_2 e^{2j\theta} + \ldots$$

is the (unique) minimizer of the variance

$$\mathcal{E}_{d\mu_{\rm MR}}\{|u_0 - \hat{u}_{0|{\rm past\,\&\,future}}|^2\}$$

of the smoothing error when the smoothing filter

$$\hat{u}_{0|{\rm past\,\&\,future}} := \sum_{k \neq 0} \beta_k u_{-k} \tag{7}$$

is sought to depend linearly on past and future observations. In general, the minimal variance of
the error depends on the choice of $d\mu$ (which is subject to (1)). This variance is maximal when
$d\mu_{\rm MR}$ is selected, i.e., the most random power spectrum solves the min-max problem:

$$\max_{d\mu}\, \min_{\beta} \left\{ \mathcal{E}_{d\mu}\Big\{\Big|u_0 - \sum_{k \neq 0} \beta_k u_{-k}\Big|^2\Big\} : \text{(1) holds} \right\}.$$

The last statement of the theorem echoes the analogous property of the maximum entropy solution.
In fact, it can be seen that in the present case $d\mu_{\rm MR}$, together with the coefficients $\beta_k$ in (7) for the
smoothing filter, represent a saddle point of $\mathcal{E}_{d\mu}\{|u_0 - \sum_{k \neq 0} \beta_k u_{-k}|^2\}$.

The ME-power spectrum is rational and its coefficients can be obtained by solving a system of linear
equations (the Yule-Walker-Levinson equations) which give rise to the following expression for $a(z)$:

$$a(z) = \frac{1}{\det(\mathbf{R}_{n-1})} \det \begin{pmatrix} R_0 & R_{-1} & \cdots & R_{-n} \\ R_1 & R_0 & \cdots & R_{-n+1} \\ \vdots & \vdots & & \vdots \\ R_{n-1} & R_{n-2} & \cdots & R_{-1} \\ z^n & z^{n-1} & \cdots & 1 \end{pmatrix} \tag{8}$$

while $\kappa_{\rm ME}^2 = \det(\mathbf{R}_n)/\det(\mathbf{R}_{n-1})$, e.g., see [13] and also [7, page 156]. The corresponding random
process can then be simulated via a Markovian realization, in fact via an autoregressive model with
transfer function $\kappa_{\rm ME}/a(z)$ driven by a unit-variance, white-noise input, cf. [13].
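As a small numerical illustration of the linear (Yule-Walker-Levinson) computation, the following sketch runs the standard Levinson-Durbin recursion on real-valued autocorrelation samples and then checks, by quadrature on a uniform grid, that the resulting all-pole density reproduces the given moments. The function names `levinson`, `me_density` and `moments`, the restriction to real data, and the sample values are assumptions made for this example only.

```python
import math

def levinson(R):
    """Levinson-Durbin recursion for real autocorrelations R[0..n].

    Returns (a, kappa2): a = [1, a_1, ..., a_n] are the coefficients of a(z)
    and kappa2 is the one-step-ahead prediction error variance.
    """
    a = [1.0]
    kappa2 = float(R[0])
    for m in range(1, len(R)):
        acc = sum(a[i] * R[m - i] for i in range(m))
        k = -acc / kappa2                      # reflection coefficient
        a = [1.0] + [a[i] + k * a[m - i] for i in range(1, m)] + [k]
        kappa2 *= (1.0 - k * k)
    return a, kappa2

def me_density(a, kappa2, theta):
    """All-pole density kappa2 / |a(e^{j theta})|^2."""
    re = sum(ak * math.cos(k * theta) for k, ak in enumerate(a))
    im = sum(ak * math.sin(k * theta) for k, ak in enumerate(a))
    return kappa2 / (re * re + im * im)

def moments(density, n, N=1024):
    """Approximate (1/2pi) * integral of cos(k theta) f(theta), k = 0..n."""
    grid = [2.0 * math.pi * i / N for i in range(N)]
    vals = [density(t) for t in grid]
    return [sum(v * math.cos(k * t) for v, t in zip(vals, grid)) / N
            for k in range(n + 1)]

# Example data (an assumption): moments of 5/4 + cos(theta), with n = 2.
R = [1.25, 0.5, 0.0]
a, kappa2 = levinson(R)
matched = moments(lambda t: me_density(a, kappa2, t), len(R) - 1)
```

For these sample values `kappa2` agrees with the determinant ratio $\det(\mathbf{R}_2)/\det(\mathbf{R}_1) = 1.328125/1.3125$, and `matched` agrees with `R` up to quadrature and rounding error.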

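The case of the MR-power spectrum differs substantially in this respect. The power spectral density
function is not rational. However, its coefficients can be readily obtained from the data $\mathbf{R}_n$ using the
formalism in [4], [5]. This is explained in the following statement.

Theorem 3: Let $\mathbf{R}_n > 0$, define the column vectors

$$\mathbf{R}_1 := [\, R_n^* \ \ldots \ R_1^* \ R_0 \ R_1 \ \ldots \ R_n \,]', \text{ and}$$

$$G(e^{j\theta}) := [\, e^{jn\theta} \ \ldots \ e^{j\theta} \ 1 \ e^{-j\theta} \ \ldots \ e^{-jn\theta} \,]',$$

of size $2n + 1$, where $'$ denotes transposition (without complex conjugation), and let $\lambda(t) \in \mathbb{C}^{(2n+1)\times 1}$
represent the solution of the differential equation

$$\frac{d\lambda(t)}{dt} = M(\lambda(t))^{-1}\left( \mathbf{R}_1 - \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{G(e^{j\theta})}{\sqrt{\lambda(t)G(e^{j\theta})}}\, d\theta \right) \tag{9}$$

on $[0, \infty)$, where

$$M(\lambda(t)) := -\frac{1}{2\pi}\int_{-\pi}^{\pi} G(e^{j\theta})\, \frac{1/2}{(\lambda(t)G(e^{j\theta}))^{3/2}}\, G(e^{j\theta})^*\, d\theta \tag{10}$$

and

$$\lambda(0) = \lambda_0 := [\, \underbrace{0 \ \ldots \ 0}_{n} \ 1 \ \underbrace{0 \ \ldots \ 0}_{n} \,].$$

Then the following hold:

(i) $\lambda(t)$ tends to a limit $\lambda_{\rm MR} \in \mathbb{C}^{(2n+1)\times 1}$ as $t \to \infty$,

(ii) $\lambda_{\rm MR} G(e^{j\theta}) > 0$ for all $\theta \in [-\pi, \pi)$ and

$$d\mu(\theta) = \frac{1}{\sqrt{\lambda_{\rm MR} G(e^{j\theta})}}\, d\theta \quad \text{satisfies (1)}, \tag{11}$$

(iii) $\lambda_{\rm MR}$ is the unique value in $\mathbb{C}^{2n+1}$ for which (ii) holds.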
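To make the statement concrete, here is a minimal numerical sketch for real-valued autocorrelation samples. It does not integrate (9) directly. Instead it uses a discrete stand-in: the target moments are moved in small stages along the straight line from the moments of the flat spectrum ($b \equiv 1$) to the data, and a few Newton corrections, each using the Jacobian of the moment map, are applied at every stage. The cosine parametrization $b(\theta) = \sum_k c_k \cos(k\theta)$, the uniform quadrature grid, the step-halving safeguard, the helper names `solve` and `mr_spectrum`, and the sample data are all assumptions of this sketch.

```python
import math

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    m = len(y)
    M = [A[i][:] + [y[i]] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            w = M[r][col] / M[col][col]
            for j in range(col, m + 1):
                M[r][j] -= w * M[col][j]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, m))
        x[r] = (M[r][m] - s) / M[r][r]
    return x

def mr_spectrum(R, stages=20, newton_iters=3, N=256):
    """Return (c, grid, f) with f = 1/sqrt(b), b = sum_k c[k] cos(k theta)."""
    n = len(R) - 1
    idx = range(n + 1)
    grid = [2.0 * math.pi * i / N for i in range(N)]
    C = [[math.cos(k * t) for t in grid] for k in idx]

    def b_of(c):
        return [sum(c[k] * C[k][i] for k in idx) for i in range(N)]

    def newton_step(c, target):
        b = b_of(c)
        f = [x ** -0.5 for x in b]
        w = [x ** -1.5 for x in b]
        # moments of the current density and Jacobian of the moment map
        H = [sum(C[k][i] * f[i] for i in range(N)) / N for k in idx]
        J = [[-0.5 * sum(C[k][i] * C[l][i] * w[i] for i in range(N)) / N
              for l in idx] for k in idx]
        d = solve(J, [target[k] - H[k] for k in idx])
        s = 1.0
        # safeguard: halve the step until b stays strictly positive
        while min(b_of([c[k] + s * d[k] for k in idx])) <= 0.0:
            s *= 0.5
        return [c[k] + s * d[k] for k in idx]

    flat = [1.0] + [0.0] * n           # moments of the flat spectrum, b = 1
    c = [1.0] + [0.0] * n
    for stage in range(1, stages + 1):
        tau = stage / stages
        target = [tau * R[k] + (1.0 - tau) * flat[k] for k in idx]
        iters = newton_iters if stage < stages else newton_iters + 5
        for _ in range(iters):
            c = newton_step(c, target)
    return c, grid, [x ** -0.5 for x in b_of(c)]

# Example data (an assumption): a mildly colored, positive definite sequence.
c, grid, f = mr_spectrum([1.0, 0.3, 0.1])
```

On the quadrature grid the returned density should be strictly positive and should reproduce the three given moments. The staged approach is chosen here only because each Newton correction starts close to its target, which keeps the iteration inside the region where $b > 0$.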

III. Notation and preliminaries 

We consider the scalar zero-mean stationary random process $\{u_k, k \in \mathbb{Z}\}$ and, as before, we let
$R_0, R_1, R_2, \ldots$ represent its sequence of autocorrelation samples and $d\mu(\theta)$ its power spectrum. We study
quadratic optimization problems with respect to the usual inner product

$$\Big\langle \sum_k a_k u_k, \sum_\ell b_\ell u_\ell \Big\rangle_{d\mu} := \mathcal{E}_{d\mu}\Big\{\Big(\sum_k a_k u_k\Big)\Big(\sum_\ell b_\ell u_\ell\Big)^*\Big\} = \sum_{k,\ell} a_k R_{k-\ell}\, b_\ell^* \tag{12}$$

where $R_{-m} := R_m^*$. As usual [12], the closure of span$\{u_k : k \in \mathbb{Z}\}$, which we denote by $\mathcal{U}$, can be
identified with the space $L_{2,d\mu}$ of functions which are square integrable with respect to $d\mu(\theta)$ on the unit
circle with inner product

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} a(\theta)\,(b(\theta))^*\, d\mu(\theta)$$

where $a(\theta) = \sum_k a_k e^{-jk\theta}$ and $b(\theta) = \sum_\ell b_\ell e^{-j\ell\theta}$. Then it is quite standard that the correspondence

$$\sum_k a_k u_k \mapsto a(\theta)$$

is a Hilbert space isomorphism.

Least-variance approximation problems can equivalently be expressed in $L_{2,d\mu}$, whose norm we denote by
$\|\cdot\|_{d\mu}$. In particular, the variance
$\mathcal{E}_{d\mu}\{|u_0 - \hat{u}_{0|{\rm past}}|^2\}$ of the one-step-ahead prediction error

$$u_0 - \hat{u}_{0|{\rm past}}$$

with $\hat{u}_{0|{\rm past}} = \sum_{k>0} \alpha_k u_{-k}$ as in (4), can equivalently be expressed in the form

$$\Big\| 1 - \sum_{k>0} \alpha_k e^{jk\theta} \Big\|_{d\mu}^2, \tag{13}$$

and similarly, the variance of the smoothing error $\mathcal{E}_{d\mu}\{|u_0 - \hat{u}_{0|{\rm past\,\&\,future}}|^2\}$ is simply

$$\Big\| 1 - \sum_{k \neq 0} \beta_k e^{jk\theta} \Big\|_{d\mu}^2, \tag{14}$$

in view of $\hat{u}_{0|{\rm past\,\&\,future}}$ given in (7).

The power spectrum $d\mu$ is a bounded nonnegative measure on $[-\pi, \pi)$ and admits a decomposition
$d\mu = d\mu_s + f d\theta$ with $d\mu_s$ a singular measure and $f d\theta$ the absolutely continuous part of $d\mu$ (with respect
to the Lebesgue measure). Then, the variance of the optimal one-step-ahead prediction error is given in
terms of the power spectral density function $f$ by the celebrated Szegő-Kolmogorov formula given below.

Theorem 4: [14] With $d\mu = d\mu_s + f d\theta$ as above

$$\inf_{\alpha} \Big\| 1 - \sum_{k>0} \alpha_k e^{jk\theta} \Big\|_{d\mu}^2 = \exp\left\{ \frac{1}{2\pi}\int_{-\pi}^{\pi} \log f(\theta)\, d\theta \right\}$$

when $\log f \in L_1$, and zero otherwise.

For a proof see [8, page 183], and also [15, Chapter 6]. In the next section we derive an analogous
formula for the variance of the optimal smoothing error when using both past and future values of $u_\ell$.


IV. Least-variance smoothing

Given the power spectrum $d\mu$ of a random process we seek the optimal linear smoothing filter using
both past and future observations. It turns out that the variance of the smoothing error is the harmonic
mean of the spectral density of the random process, i.e., it relates to the 0th Fourier coefficient of the
inverse of the spectral density of the process. This result will be used in the next section for the purpose
of identifying the MR-spectra which are consistent with a finite set of autocorrelation samples.

Theorem 5: Let $d\mu$ be a bounded nonnegative measure on $[-\pi, \pi)$, let $d\mu = d\mu_s + f d\theta$ be the
decomposition of $d\mu$ into its singular and absolutely continuous parts. Then, the infimum of

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |a(\theta)|^2\, d\mu(\theta) \tag{15}$$

subject to the constraints

$$a(\theta) \in L_1, \tag{16}$$

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} a(\theta)\, d\theta = 1 \tag{17}$$

is equal to

$$\left( \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)^{-1}\, d\theta \right)^{-1} \tag{18}$$

when $f^{-1} \in L_1$, and zero otherwise.

An important step in the proof of the theorem is provided by the following lemma. 

Lemma 6: [8] Let $d\mu_s$ be a bounded singular measure on $\theta \in I := [-\pi, \pi)$ (i.e., the absolutely
continuous part of $d\mu_s$ is identically zero) and let $\epsilon_1$ be an arbitrary positive number. Then, it is
always possible to decompose the interval $I$ into a finite number of intervals such that for a certain
class $I_1$ of these intervals (i.e., their union) and for the complementary class $I_2 = I \setminus I_1$, the following
inequalities hold:

$$\frac{1}{2\pi}\int_{I_1} d\mu_s(\theta) < \epsilon_1,$$

$$\frac{1}{2\pi}\int_{I_2} d\theta < \epsilon_1.$$

For a proof of Lemma 6 see [8, page 7]. We now proceed with the proof of Theorem 5.

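Proof of Theorem 5: Assume first that $d\mu$ is absolutely continuous with no singular part. Given any
positive number $\epsilon$ define

$$a_\epsilon(\theta) := \frac{(f(\theta) + \epsilon)^{-1}}{\frac{1}{2\pi}\int_{-\pi}^{\pi} (f(\theta) + \epsilon)^{-1}\, d\theta}.$$

We note that $a_\epsilon \in L_1$ (also in $L_2$ and in fact, it is even bounded and positive),

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} a_\epsilon(\theta)\, d\theta = 1,$$

and we observe that

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |a_\epsilon(\theta)|^2 f(\theta)\, d\theta = \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} (f(\theta) + \epsilon)^{-1} d\theta \right)^{-2} \frac{1}{2\pi}\int_{-\pi}^{\pi} (f(\theta) + \epsilon)^{-2} f(\theta)\, d\theta \le \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} (f(\theta) + \epsilon)^{-1} d\theta \right)^{-1}$$

because $f(\theta)/(f(\theta) + \epsilon) < 1$. If $f^{-1} \notin L_1$ then

$$\lim_{\epsilon \to 0} \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} (f(\theta) + \epsilon)^{-1} d\theta \right)^{-1} = 0,$$

whereas if $f^{-1} \in L_1$ the limit equals the expression given in (18). To prove our claims for the case where
$d\mu$ is absolutely continuous, it remains to show that when $f^{-1} \in L_1$ the infimal value for (15) is never
strictly less than (18).

Continuing on, we assume that $f^{-1} \in L_1$. We normalize $f^{-1}$ to have the identity as its 0th Fourier
coefficient

$$a_0 := \frac{f^{-1}}{\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)^{-1}\, d\theta}$$

and consider the perturbation

$$a = a_0 + \delta$$

for an arbitrary $\delta \in L_1$ with vanishing 0th Fourier coefficient (i.e., a $\delta \in L_1$ satisfying
$\frac{1}{2\pi}\int_{-\pi}^{\pi} \delta(\theta)\, d\theta = 0$) so
that $a$ satisfies (17). It readily follows that

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |a(\theta)|^2 d\mu(\theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left( |a_0(\theta)|^2 + 2\delta(\theta) a_0(\theta) + |\delta(\theta)|^2 \right) f(\theta)\, d\theta = \frac{1}{2\pi}\int_{-\pi}^{\pi} |a_0(\theta)|^2 f(\theta)\, d\theta + \frac{1}{2\pi}\int_{-\pi}^{\pi} |\delta(\theta)|^2 f(\theta)\, d\theta \tag{19}$$

where for the last step we note that

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \delta(\theta) a_0(\theta) f(\theta)\, d\theta = \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)^{-1} d\theta \right)^{-1} \frac{1}{2\pi}\int_{-\pi}^{\pi} \delta(\theta) f(\theta)^{-1} f(\theta)\, d\theta = \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)^{-1} d\theta \right)^{-1} \frac{1}{2\pi}\int_{-\pi}^{\pi} \delta(\theta)\, d\theta = 0.$$

The first term in (19) is precisely the claimed infimal value in (18) and the second term is clearly
nonnegative. This proves our claim in the case where $d\mu$ is absolutely continuous.

We now consider the case where

$$d\mu(\theta) = d\mu_s(\theta) + f(\theta)\, d\theta$$

with $d\mu_s$ a singular measure (always with respect to the Lebesgue measure). For an arbitrary $\epsilon > 0$ we
consider a decomposition of

$$[-\pi, \pi) = I_1 \cup I_2$$

where

$$\frac{1}{2\pi}\int_{I_1} d\mu_s(\theta) < \epsilon^3 \tag{20}$$

$$\frac{1}{2\pi}\int_{I_2} d\theta < \epsilon^3. \tag{21}$$

That such a decomposition exists follows from Lemma 6 taking $\epsilon_1 = \epsilon^3$ in the statement of the lemma.
Now let $\chi_{I_1}$ denote the characteristic function of $I_1$ which takes the value 1 when $\theta \in I_1$ and zero
otherwise, and set

$$a_\epsilon(\theta) := \frac{(f(\theta) + \epsilon)^{-1} \chi_{I_1}(\theta)}{\frac{1}{2\pi}\int_{-\pi}^{\pi} (f(\theta) + \epsilon)^{-1} \chi_{I_1}(\theta)\, d\theta} \tag{22}$$

which is in $L_1$ and has the identity as its 0th Fourier coefficient. Then

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |a_\epsilon(\theta)|^2 d\mu(\theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} |a_\epsilon(\theta)|^2 d\mu_s(\theta) + \frac{1}{2\pi}\int_{-\pi}^{\pi} |a_\epsilon(\theta)|^2 f(\theta)\, d\theta.$$

The first term on the right hand side is bounded above by

$$\frac{1}{\epsilon^2} \cdot \frac{\epsilon^3}{\left( \frac{1}{2\pi}\int_{-\pi}^{\pi} (f(\theta) + \epsilon)^{-1} \chi_{I_1}(\theta)\, d\theta \right)^2}$$

which decays to 0 with $\epsilon$, whereas the second term is bounded above by

$$\frac{1}{\frac{1}{2\pi}\int_{-\pi}^{\pi} (f(\theta) + \epsilon)^{-1} \chi_{I_1}(\theta)\, d\theta}$$

which in the limit recovers the claimed bound (18). The earlier argument for the case of absolutely
continuous $d\mu$ applies and shows that this bound is in fact the correct value for the infimum and that no
lower value is possible. ■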
Remark 7: It is clear from the proof that if $f^{-1} \in L_1$ and $d\mu = f d\theta$ is absolutely continuous, then

$$a_0 = \frac{f^{-1}}{\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)^{-1}\, d\theta}$$

is the unique optimal solution which achieves the minimal value
for

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |a(\theta)|^2\, d\mu(\theta)$$

subject to $a \in L_1$ with 0th Fourier coefficient the identity. Thus, if

$$a_0(\theta) \sim \ldots + \rho_{-1} e^{-j\theta} + 1 + \rho_1 e^{j\theta} + \ldots$$

and $\rho_\ell$, $\ell = \pm 1, \pm 2, \ldots$ the corresponding Fourier coefficients, then

$$\hat{u}_{0|{\rm past\,\&\,future}} = -\sum_{\ell \neq 0} \rho_\ell\, u_{-\ell}$$

is the optimal estimate for $u_0$ in the least-variance sense, where $u_\ell$ is a random process with $d\mu$ as its power
spectrum. In this case the infimum is achieved, and hence it represents the minimum variance of the error.
When the power spectrum either has a singular part or $f^{-1} \notin L_1$, then $a_\epsilon$ as in (22) provides suboptimal
solutions. This is completely analogous to the Szegő-Kolmogorov setting where optimal one-step-ahead
predictors (which use only past observations) exist when $\log f \in L_1$; otherwise the least variance is not
attained but can be approached arbitrarily closely [7, Chapter II].

Remark 8: It is interesting to observe that while the minimal variance of a smoothing error for a random
process having $f$ as spectral density is the harmonic mean

$$m_{-1,f} := \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)^{-1}\, d\theta \right)^{-1}$$

of the values of $f$ on $[-\pi, \pi)$, the minimal variance of the optimal one-step-ahead predictor using
only past observations is the geometric mean (see [8, page 183], [15, Chapter 6])

$$m_{0,f} := \exp\left( \frac{1}{2\pi}\int_{-\pi}^{\pi} \log(f(\theta))\, d\theta \right).$$

The former is the inverse of $\mathbb{J}(d\mu/d\theta)$ whereas the latter is the exponential of $-\mathbb{I}(d\mu/d\theta)$. Naturally, $m_{-1,f} \le m_{0,f}$
(see also [1, page 23]). This ordering is clear from the interpretation of the two quantities as variances
of best predictors which use "past+future" and "only past" observations, respectively.
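The ordering of the two means can be checked numerically. The sketch below evaluates the harmonic, geometric and arithmetic means of a density on a uniform midpoint grid. The helper name `spectral_means` and the test density $f(\theta) = 5/4 + \cos\theta = |1 + \tfrac{1}{2}e^{j\theta}|^2$ are assumptions chosen for illustration. For this density the harmonic mean is $3/4$ (since the mean of $1/(a + b\cos\theta)$ is $1/\sqrt{a^2 - b^2}$) and the geometric mean is $1$.

```python
import math

def spectral_means(f, N=4096):
    """Harmonic, geometric and arithmetic means of f over [-pi, pi).

    Uses a uniform midpoint grid; f must be strictly positive on the grid.
    """
    vals = [f(2.0 * math.pi * (i + 0.5) / N) for i in range(N)]
    harmonic = 1.0 / (sum(1.0 / v for v in vals) / N)      # smoothing error variance
    geometric = math.exp(sum(math.log(v) for v in vals) / N)  # prediction error variance
    arithmetic = sum(vals) / N                              # R_0, the process variance
    return harmonic, geometric, arithmetic

m_harm, m_geom, m_arith = spectral_means(lambda t: 1.25 + math.cos(t))
```

The three numbers are ordered as harmonic, then geometric, then arithmetic, which mirrors the amount of data each estimator is allowed to use: past and future, past only, and none.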

V. On deterministic processes: an example 

It may be rather surprising, at first glance, that the value of a random process with power spectral
density

$$f_0(\theta) = |1 - e^{j\theta}|^2 = 2 - 2\cos(\theta) \tag{23}$$

can be predicted at any given point with arbitrarily small variance, when both past and future observations
are available. Yet this is the case, and this is due to the fact that $f_0^{-1} \notin L_1$ (equivalently $m_{-1,f_0} = 0$). This
example highlights the difference between "deterministic processes" in the sense of $m_{-1,f} = 0$ and those
in the sense of Szegő-Kolmogorov which are characterized by $m_{0,f} = 0$ or, equivalently, by $\log f \notin L_1$
instead.

For our particular example, the fact that $f_0^{-1} \notin L_1$ follows from the divergence of the integral of
$f_0^{-1}$ over $\{\theta : \epsilon \le |\theta| \le \pi\}$
as $\epsilon \to 0$. On the other hand, the fact that $\log f_0 \in L_1$ can be seen as follows. Since $g(z) = 1 - z$ is
analytic and does not vanish in $\mathbb{D}$, $\log |g|$ is harmonic and

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \log(|g(re^{j\theta})|)\, d\theta = \log(|g(0)|) = 0$$

for any value of $r \in [0, 1)$. Therefore, the integral of the logarithm of $\lim_{r \to 1} |g(re^{j\theta})|$ also vanishes, and
the same applies to $f_0(\theta) = \lim_{r \to 1} |g(re^{j\theta})|^2$.
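Both integrability statements can be probed numerically. The sketch below regularizes $f_0$ by a small $\epsilon > 0$ and evaluates the harmonic mean of $f_0 + \epsilon$, which has the closed form $\sqrt{4\epsilon + \epsilon^2}$ and therefore tends to zero, and it evaluates the mean of $\log f_0$ on a midpoint grid that avoids the zero at $\theta = 0$. The function names and grid sizes are assumptions made for this illustration.

```python
import math

def f0(theta):
    """The density 2 - 2 cos(theta) = |1 - e^{j theta}|^2."""
    return 2.0 - 2.0 * math.cos(theta)

def regularized_harmonic_mean(eps, N=1 << 16):
    """Harmonic mean of f0 + eps over one period (midpoint rule)."""
    s = sum(1.0 / (f0(2.0 * math.pi * (i + 0.5) / N) + eps) for i in range(N)) / N
    return 1.0 / s

def mean_log_f0(N=1 << 16):
    """(1/2pi) * integral of log f0 (midpoint rule, skips theta = 0)."""
    return sum(math.log(f0(2.0 * math.pi * (i + 0.5) / N)) for i in range(N)) / N

values = [(eps, regularized_harmonic_mean(eps)) for eps in (1e-2, 1e-3, 1e-4)]
log_mean = mean_log_f0()
```

As $\epsilon$ decreases the regularized harmonic mean shrinks like $2\sqrt{\epsilon}$, which reflects $m_{-1,f_0} = 0$, while `log_mean` stays close to zero, which reflects $m_{0,f_0} = 1$.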

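In the rest of this section we explain how a random process corresponding to $f_0$ can be predicted with
vanishingly small variance from the combined past and future record. We do so, for didactic purposes,
by sketching a specialized and more direct construction than that of Section IV.

Consider a realization $\{u_k, k \in \mathbb{Z}\}$ of a random process corresponding to $f_0$ as follows:

$$u_k = w_k - w_{k-1}$$

where $\{w_k, k \in \mathbb{Z}\}$ is a sequence of independent, identically distributed random variables with zero mean
and unit variance (i.e., a white-noise process). We assume that "past" ($\{u_k, k < 0\}$) as well as "future"
($\{u_k, k > 0\}$) observations are available, and that we wish to estimate the "present" $u_0 = w_0 - w_{-1}$ based
on this two-sided observation record. Then,

$$u_{<0} := \begin{bmatrix} u_{-1} \\ u_{-2} \\ u_{-3} \\ \vdots \end{bmatrix} = \begin{bmatrix} 1 & -1 & & \\ & 1 & -1 & \\ & & 1 & \ddots \\ & & & \ddots \end{bmatrix} \begin{bmatrix} w_{-1} \\ w_{-2} \\ w_{-3} \\ \vdots \end{bmatrix}$$

and

$$u_{>0} := \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \end{bmatrix} = \begin{bmatrix} -1 & 1 & & \\ & -1 & 1 & \\ & & -1 & \ddots \\ & & & \ddots \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ \vdots \end{bmatrix}.$$

In both cases the mapping is Toeplitz, and identical except for a sign change. Let now

$$v := [\, 1 \ \ (1-\epsilon) \ \ (1-\epsilon)^2 \ \ \ldots \,],$$

for $1 > \epsilon > 0$, and define

$$\hat{w}_{-1} := v\, u_{<0} = w_{-1} - \epsilon \sum_{k=-2}^{-\infty} (1-\epsilon)^{|k+2|} w_k,$$

$$\hat{w}_0 := -v\, u_{>0} = w_0 - \epsilon \sum_{k=1}^{\infty} (1-\epsilon)^{k-1} w_k,$$

$$\hat{u}_0 := \hat{w}_0 - \hat{w}_{-1}.$$

Each of the above can be taken as an estimator for the corresponding un-hatted variable. The variance of
estimation in all cases can be made arbitrarily small with an appropriately small choice for $\epsilon$. This justifies
our claim.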
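The construction can be checked with a few lines of arithmetic. The sketch below assumes geometric weights $v_i = (1-\epsilon)^i$ and a record truncated to $N$ past and $N$ future samples (both are assumptions of this illustration, as is the function name). It expresses the error $u_0 - \hat{u}_0$ as a linear combination of the white-noise variables $w_k$ and sums the squared coefficients, which gives the error variance because the $w_k$ are uncorrelated with unit variance. For an infinite record the same calculation gives $2\epsilon/(2-\epsilon)$.

```python
def smoothing_error_variance(eps, N=5000):
    """Variance of u_0 - u0_hat for u_k = w_k - w_{k-1}, truncated record.

    The error equals sum_m g[m] * u_m with g[0] = 1 and
    g[m] = (1 - eps)**(|m| - 1) for m != 0, so that w_k enters with
    coefficient g[k] - g[k+1] (g is zero outside -N..N).
    """
    g = [1.0 if m == 0 else (1.0 - eps) ** (abs(m) - 1) for m in range(-N, N + 1)]
    padded = [0.0] + g + [0.0]
    coeffs = [padded[i] - padded[i + 1] for i in range(len(padded) - 1)]
    return sum(x * x for x in coeffs)

variances = [(eps, smoothing_error_variance(eps)) for eps in (0.1, 0.01)]
```

The coefficients of $w_0$ and $w_{-1}$ cancel exactly, and the remaining ones form two geometric tails of size $\epsilon(1-\epsilon)^{k-1}$, so the variance decreases roughly in proportion to $\epsilon$.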
VI. Proofs of Theorems 2 and 3

Due to the strict convexity of the inversion map $x \mapsto 1/x$ on $\mathbb{R}_+$, $\mathbb{J}(\cdot)$ is also a strictly convex
functional on (non-negative) density functions. We first show that a spectral density $f_{\rm MR}$ of the form
claimed in Theorem 2 is indeed a minimizer of $\mathbb{J}(\cdot)$ subject to the moment constraints

$$R_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-jk\theta} f(\theta)\, d\theta, \quad \text{for } k = 0, 1, \ldots, n. \tag{24}$$

Existence of suitable values for the corresponding parameters requires proving Theorem 3 next, which
claims that these values correspond to an attractive equilibrium of a certain differential equation. The form
of $f_{\rm MR}$ ensures stationarity and hence, due to the strict convexity of $\mathbb{J}(\cdot)$, it ensures that this is indeed the
unique extremal point. Finally, we revisit the optimization problem and consider measures with a possible
singular part. The singular part does not affect the value of $\mathbb{J}(\cdot)$, but the fact that a singular part is allowed
relaxes the constraint (24) to (1). Yet, as we will see, $f_{\rm MR}$ is still the minimizer and, hence, the extremal
spectral measure $d\mu$ cannot have a singular part. In the end, we return to the remaining claims in Theorem
2 regarding properties of the minimizer.



A. Functional form of minimizer 

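Consider first the problem of minimizing $\mathbb{J}(f)$ with $f$ constrained to satisfy (24). If

$$\lambda := [\, \lambda_{-n} \ \ldots \ \lambda_0 \ \ldots \ \lambda_n \,]$$

denotes a vector of Lagrange multipliers, the corresponding Lagrangian is

$$\mathcal{L}(f, \lambda) := \mathbb{J}(f) - \lambda\left( \mathbf{R}_1 - \frac{1}{2\pi}\int_{-\pi}^{\pi} G(e^{j\theta}) f(\theta)\, d\theta \right)$$

where $\mathbf{R}_1$, $G$ are defined in the statement of Theorem 3. If we set the variation

$$\delta\mathcal{L}(f, \lambda; \delta f) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left( -\frac{1}{f(\theta)^2} + \lambda G(e^{j\theta}) \right) \delta f(\theta)\, d\theta$$

identically equal to zero for all perturbations $\delta f$ (assuming that $f > 0$ and hence $\delta f$ unconstrained), then
we conclude that

$$f(\theta) = \frac{1}{\sqrt{\lambda G(e^{j\theta})}} \tag{25}$$

which is the form claimed in Theorem 3 for $f_{\rm MR}$. Our next step is to prove that, provided $\mathbf{R}_n > 0$, there
always exists such a density function which satisfies (24) and that the trigonometric polynomial $\lambda G(e^{j\theta})$
is in fact strictly positive.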



B. Proof of Theorem 3

We follow the formalism in [5] for solving moment problems. We denote by $\mathfrak{R}$ the positive cone

$$\mathfrak{R} := \left\{ \mathbf{R} : \mathbf{R} = \frac{1}{2\pi}\int_{-\pi}^{\pi} G(e^{j\theta})\, d\mu(\theta), \text{ where } d\mu \ge 0 \right\}$$

and by $\mathfrak{L}$ the dual cone

$$\mathfrak{L} := \{ \lambda : \lambda G(e^{j\theta}) \ge 0 \text{ for } \theta \in [-\pi, \pi] \}.$$

Both are subsets of $\mathbb{R} \times \mathbb{C}^{2n}$ since their "0th" entries $R_0, \lambda_0 \in \mathbb{R}_+$ while the remaining entries $R_\ell, \lambda_\ell \in \mathbb{C}$
($\ell = \pm 1, \pm 2, \ldots, \pm n$). Also, both are convex. The interior of $\mathfrak{R}$ is denoted by int$(\mathfrak{R})$ and the interior
of the dual cone, which consists of all vectors $\lambda$ such that the trigonometric polynomial $\lambda G$ is strictly
positive on the unit circle, is denoted by $\mathfrak{L}_+$. The Jacobian $\partial H / \partial \lambda$ of the mapping

$$H : \mathfrak{L}_+ \to {\rm int}(\mathfrak{R}) : \lambda \mapsto \frac{1}{2\pi}\int_{-\pi}^{\pi} G(e^{j\theta}) \frac{1}{\sqrt{\lambda G(e^{j\theta})}}\, d\theta$$

between Lagrange vectors and moments is given in (10) and is denoted by $M(\lambda)$. As long as $\lambda \in \mathfrak{L}_+$
the Jacobian is an invertible matrix. Our goal is to find a value for $\lambda$ so that condition (ii) of Theorem 3
holds. We do this as follows.

We begin with $\lambda_0$ as in Theorem 3 for which we readily observe that $\lambda_0 G = 1 > 0$. It follows that

$$\mathbf{R}_0 := \frac{1}{2\pi}\int_{-\pi}^{\pi} G(e^{j\theta}) \frac{1}{\sqrt{\lambda_0 G(e^{j\theta})}}\, d\theta \in {\rm int}(\mathfrak{R}).$$

Since $\mathbf{R}_n > 0$, we also know that $\mathbf{R}_1 \in {\rm int}(\mathfrak{R})$. Since int$(\mathfrak{R})$ is convex and $\mathbf{R}_1, \mathbf{R}_0 \in {\rm int}(\mathfrak{R})$, the interval
$[\mathbf{R}_0, \mathbf{R}_1] \subset {\rm int}(\mathfrak{R})$, i.e.,

$$\mathbf{R}_\tau := \tau \mathbf{R}_1 + (1 - \tau) \mathbf{R}_0 \tag{26}$$

belongs to int$(\mathfrak{R})$ for all $\tau \in [0, 1]$. The key idea is now to trace $\mathbf{R}_\tau$ by following corresponding values
for $\lambda_\tau$ in the dual cone. This is not always possible. It depends on the functional form for the sought
spectral density function $f$. The critical issue that may prevent such path-following in the dual space is
whether any $\lambda$ in the boundary of $\mathfrak{L}$ maps onto a point in the interior of $\mathfrak{R}$. When this happens, there
are interior points in $\mathfrak{R}$ which do not admit the assumed representation. We will see below that this does
not happen for the functional form $1/\sqrt{\lambda G}$ and hence, that the plan we have outlined applies. We discuss
these key steps/facts next.

The moments $\mathbf{R}_\tau$, $\tau \in [0, 1]$, satisfy the differential equation

$$\frac{d\mathbf{R}_\tau}{d\tau} = \mathbf{R}_1 - \mathbf{R}_0 \tag{27}$$

as follows readily from (26). Then the dual parameters $\lambda(\tau)$ satisfy

$$\frac{d\lambda(\tau)}{d\tau} = M(\lambda)^{-1}(\mathbf{R}_1 - \mathbf{R}_0), \tag{28}$$

as long as $\lambda(\tau)$ remains in $\mathfrak{L}_+$, in which case $M(\lambda)$ is invertible being the (inverse of
the) autocorrelation matrix of a positive spectral density function. We claim that this is always the case.
To prove it, assume that the contrary is true and that $[0, \tau_0)$ is a maximal subinterval of $[0, 1]$ for which
$\lambda(\tau) \in \mathfrak{L}_+$ for $0 \le \tau < \tau_0$. Thus, the family of positive trigonometric polynomials

$$\{ \lambda(\tau) G(e^{j\theta}) : \tau \in [0, \tau_0) \}$$

has either a limit point on the boundary of $\mathfrak{L}_+$ or it grows unbounded. In either case we will draw a
contradiction.

In the first case, there must exist an accumulation point $\hat\lambda$ for which $\hat\lambda G(e^{j\theta})$ vanishes on the unit circle.
But then $\hat\lambda G(e^{j\theta})$, which is a nonnegative trigonometric polynomial, must have a double root at some
point $e^{j\theta_0}$. Therefore

$$\frac{1}{\sqrt{\hat\lambda G(e^{j\theta})}} \tag{29}$$

which has at least a single pole at $e^{j\theta_0}$, is not integrable. The assertion that the inverse of the square root
of a nonnegative trigonometric polynomial which vanishes on the circle is not integrable is elementary. It
suffices to consider a typical case, such as $1 - \cos(\theta)$, where
$\frac{1}{\sqrt{1 - \cos(\theta)}} = \frac{1}{\sqrt{2}\,|\sin(\theta/2)|} \ge \frac{\sqrt{2}}{|\theta|}$ is clearly not
integrable; the general case is similar. The nonintegrability of (29) implies that the family of vectors

$$\left\{ \frac{1}{2\pi}\int_{-\pi}^{\pi} G(e^{j\theta}) \frac{1}{\sqrt{\lambda(\tau) G(e^{j\theta})}}\, d\theta : 0 \le \tau < \tau_0 \right\}$$

is unbounded, in contradiction to the assumption that the image of $\{\lambda(\tau) : 0 \le \tau < \tau_0\}$ under $H$ is the
subset

$$\{ \mathbf{R}_\tau : 0 \le \tau < \tau_0 \}$$

of the bounded interval $[\mathbf{R}_0, \mathbf{R}_1]$.

We now draw a contradiction for the second case. We assume that $\lambda(\tau)$ grows unbounded as $\tau \to \tau_0$.
It follows that there is a sequence $\tau_i \in [0, \tau_0)$, $i = 1, 2, \ldots$ such that $\tau_i \to \tau_0$ and $\|\lambda(\tau_i)\| \to \infty$ while the
unit-length vectors

$$\lambda_i := \frac{\lambda(\tau_i)}{\|\lambda(\tau_i)\|}$$

converge as $i \to \infty$, with $\|\cdot\|$ being the Euclidean norm. At the same time, the sequence $\mathbf{R}_{\tau_i} = H(\lambda(\tau_i))$,
$i = 1, 2, \ldots$, converges to $\mathbf{R}_{\tau_0} \in {\rm int}(\mathfrak{R})$. But any interior point $\mathbf{R} \in \mathfrak{R}$ is characterized by the property
that the functional

$$\ell_{\mathbf{R}} : \mathfrak{L} \to \mathbb{R}_+ : \lambda \mapsto \lambda \mathbf{R}$$

is strictly positive (e.g., see [5, Proposition 3]). (This is due to the fact that any such $\mathbf{R}$ assumes a representation
$\frac{1}{2\pi}\int_{-\pi}^{\pi} G(e^{j\theta}) f(\theta)\, d\theta$ for some strictly positive density function $f(\theta)$.) On the other hand, returning to the
sequence $\mathbf{R}_{\tau_i}$, $i = 1, 2, \ldots$, we observe that

$$\ell_{\mathbf{R}_{\tau_i}} : \lambda_i \mapsto \lambda_i \mathbf{R}_{\tau_i} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{\lambda_i G(e^{j\theta})}{\sqrt{\lambda(\tau_i) G(e^{j\theta})}}\, d\theta$$

tends to 0 as $\|\lambda(\tau_i)\|$ grows unbounded. Therefore, the functionals $\ell_{\mathbf{R}_{\tau_i}}$, $i \to \infty$, are not uniformly bounded
away from zero. Yet, their limit $\ell_{\mathbf{R}_{\tau_0}}$ is, due to the fact that $\mathbf{R}_{\tau_0} \in {\rm int}(\mathfrak{R})$. This is a contradiction. Therefore
(28) can be integrated over the complete interval $[0, 1]$ and $\lambda(\tau)$ remains bounded and in the interior of
the dual cone (i.e., the trajectory lies in $\mathfrak{L}_+$). We identify $\lambda(1) = \lambda_{\rm MR}$.

We now re-scale the independent variable in (27)-(28) by replacing $\tau$ with $t = -\log(1 - \tau)$. We simplify
notation and denote $\mathbf{R}_{\tau(t)}$ by $\mathbf{R}_t$ and $\lambda(\tau(t))$ by $\lambda(t)$. Using $\frac{d\tau}{dt} = 1 - \tau$ and $\mathbf{R}_1 - \mathbf{R}_0 = \frac{1}{1-\tau}(\mathbf{R}_1 - \mathbf{R}_\tau)$,
we rewrite (27) as

$$\frac{d\mathbf{R}_t}{dt} = \mathbf{R}_1 - \mathbf{R}_t, \quad \text{for } t \in [0, \infty),$$

and (28) as

$$\frac{d\lambda(t)}{dt} = M(\lambda(t))^{-1}(\mathbf{R}_1 - \mathbf{R}_t), \tag{30}$$

where, as usual, $\mathbf{R}_t = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{G(e^{j\theta})}{\sqrt{\lambda(t) G(e^{j\theta})}}\, d\theta$. We have now established claims (i) and (ii) of Theorem 3. I.e.,
we have shown that as $t \to \infty$ in (30) the trajectory $\lambda(t)$ converges in $\mathfrak{L}_+$ and that the limit point $\lambda_{\rm MR}$ is
such that (11) holds. Claim (iii) of the theorem follows from the convexity of $\mathbb{J}(\cdot)$. More specifically, the
functional form of $f_{\rm MR}$ guarantees that it is a minimizer of $\mathbb{J}(\cdot)$. There can only be one such minimizer
since $\mathbb{J}(\cdot)$ is strictly convex.
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C. Proof of Theorem |2] 

Define first the column vector 

g (e jd ) := [ 1 e^ e ... e~^ e ]'. 

Assuming that dfj, = dfi s + fd6 with dfj, s a singular measure and fdO the absolutely continuous part of 
dfi, the minimization of JT(/) subject to (HJ) is equivalent to minimization of J(/) subject to 

R n >^ j\(e^)f(e)g(e^yde. (31) 

The corresponding Lagrangian is now 

C (f, A) ■= (32) 



+trace (a (r h - ±- J* g(e je )f (%(e^)*^)) 



J(/(0))+trace(AR„) 

' 1 * (g^AgieP)) f{9)d9 (33) 



2tt 

The Lagrange multiplier A is a matrix which has a Toeplitz structure. (To see this note that any possible 
component of A which is orthogonal to the subspace of Toeplitz matrices has no effect since it vanishes 
when taking the inner product trace (AT) for any Toeplitz matrix T as done in d32t .) The minimizer / 
would correspond to a measure dfx with a nontrivial singular part only if the equality constraint in dTil 
is not active. For this to be the case, the multiplier 

of f(9) in d33t must vanish at least for some values of 9. However, the correspondence 



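$$\Lambda = \begin{bmatrix}
\frac{1}{n+1}\lambda_0 & \frac{1}{n}\lambda_1 & \cdots & \frac{1}{1}\lambda_n \\
\frac{1}{n}\lambda_1 & \frac{1}{n+1}\lambda_0 & \cdots & \frac{1}{2}\lambda_{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{1}{1}\lambda_n & \frac{1}{2}\lambda_{n-1} & \cdots & \frac{1}{n+1}\lambda_0
\end{bmatrix}$$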

shows that in fact $\hat{\mathcal{L}}(f, \Lambda) = \mathcal{L}(f, \Lambda)$, i.e., it is the same Lagrangian as in Section VI-A. The values
of the Lagrange multipliers in the latter, as identified in Section VI-B, are such that $\Lambda_{\rm MR} G(e^{j\theta})$ is a
positive trigonometric polynomial. This polynomial is precisely the multiplier of $f(\theta)$ in (33) and is
strictly positive for all $\theta \in [-\pi, \pi]$. Hence, the equality constraint in (31) is active for the extremal $f$
of the relaxed problem corresponding to (32). Then, the analysis in Section VI-A applies. Therefore, the
minimizer corresponds to an absolutely continuous power spectral distribution $d\mu_{\rm MR} = f_{\rm MR}(\theta)d\theta$ which
is of the form claimed in the theorem.

We now address the remaining claims in the theorem regarding the variance of the smoothing error for
the corresponding random process. Given the expression for $f_{\rm MR}$, which is the square root of the inverse
of a positive trigonometric polynomial, the form of the optimal smoothing filter for the corresponding
random process is provided by Theorem 5. It is a consequence of the same theorem that the variance
of the optimal smoothing error $\mathcal{E}_{d\mu_{\rm MR}}\{|u_0 - \hat{u}_{0|\text{past \& future}}|^2\}$ is precisely the inverse of the $J$-functional
evaluated at $f_{\rm MR}$, i.e.,

$$\left(J(f_{\rm MR})\right)^{-1}.$$
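
The relation between the smoothing-error variance and the $J$-functional can be illustrated numerically. The sketch below assumes that $J(f) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)^{-1} d\theta$, which is how we read the statement above but is not restated in this part of the text. The function names are ours, and the test density $1 + \frac{4}{5}\cos\theta$ is the moving-average density of the example in Section VII, taken in isolation.

```python
import math

def smoothing_error_variance(f, m=4096):
    """Return 1/J(f), ASSUMING J(f) = (1/2pi) * integral of 1/f over [-pi, pi]."""
    dtheta = 2.0 * math.pi / m
    # Midpoint rule on a uniform grid; very accurate for smooth periodic integrands.
    total = sum(1.0 / f(-math.pi + (i + 0.5) * dtheta) for i in range(m))
    j_value = total * dtheta / (2.0 * math.pi)
    return 1.0 / j_value

def f_ma(theta):
    """Spectral density of u_k = w_k + 0.5*w_{k-1} with Var(w) = 0.8."""
    return 1.0 + 0.8 * math.cos(theta)

variance = smoothing_error_variance(f_ma)
print(variance)
```

For this density the integral has the closed form $1/\sqrt{1 - 0.8^2}$, so the computed variance should be close to 0.6. For comparison, the one-step prediction-error variance of the same moving-average process is the white-noise variance 0.8, so access to future samples lowers the error.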

The last part of the theorem is also immediate since 



min <^ £ dfl {\u - y^8 k u_ k \ 2 } : © holds I 



is $\left(J(d\mu/d\theta)\right)^{-1}$ for any spectral measure consistent with ©. But $d\mu_{\rm MR}$ is the unique maximizer of this
inverse.
VII. On spectral analysis: an example

For illustration purposes, we compare the power spectra $f_{\rm ME}$ and $f_{\rm MR}$ given in Theorems 1 and 2 for
a basic example. We begin by evaluating the first 4 autocorrelation moments for the following spectral
density:

$$f_{\rm true}(\theta) = 1 + \frac{4}{5}\cos(\theta) + \delta\left(\theta - \tfrac{1}{2}\right) + \delta\left(\theta + \tfrac{1}{2}\right).$$

Here, for convenience, we depart slightly from our earlier notation and incorporate the singular part of the
power spectrum into the "spectral density" as a sum of two Dirac functions, namely the distributions $\delta(\theta - \theta_0)$ for
$\theta_0 = \pm\tfrac{1}{2}$. Thus, the absolutely continuous part of the power spectrum is made up of only the continuous
portion $\left(1 + \frac{4}{5}\cos(\theta)\right)d\theta$ of $f_{\rm true}(\theta)d\theta$. The corresponding random process consists of a random moving
average component generated by

$$u_k^{\rm MA} = w_k + \tfrac{1}{2} w_{k-1}$$

with $w_k$ a white-noise process with variance $1/(1 + 1/4)$ (normalized so that $\mathcal{E}\{|u_k^{\rm MA}|^2\} = 1$), and a
deterministic sinusoidal component at frequency $\theta_0 = 1/2$ [rad/unit of time]. The first 4 samples of the
autocorrelation function of

$$u_k = u_k^{\rm MA} + 2\sin\left(\tfrac{k}{2} + \phi\right)$$

(with $\phi$, say, uniformly distributed on $[-\pi, \pi]$) can be readily computed and are as follows:

$$[\, R_0 \;\; R_{\pm 1} \;\; R_{\pm 2} \;\; R_{\pm 3} \,] = [\, 3.0000 \;\; 2.1552 \;\; 1.0806 \;\; 0.1415 \,].$$

The corresponding Toeplitz matrix $R_3$ is positive definite, and as a result, there is a nontrivial family of
power spectra which are consistent with the autocorrelation data; $d\mu(\theta) = f_{\rm true}(\theta)d\theta$ is only one of them.
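
The four autocorrelation samples and the positive definiteness of the Toeplitz matrix can be reproduced with a few lines of code. The following is a minimal sketch using only the model stated above (a moving-average part with white-noise variance $1/(1+1/4)$ and a sinusoid of amplitude 2 at frequency $1/2$ with uniformly random phase). The function names are ours, and the Cholesky test is just one convenient way to check positive definiteness.

```python
import math

def true_autocorrelation(k):
    """R_k for u_k = w_k + 0.5*w_{k-1} + 2*sin(k/2 + phi)."""
    sigma2 = 1.0 / (1.0 + 0.25)            # white-noise variance from the text
    lag = abs(k)
    if lag == 0:
        ma = sigma2 * (1.0 + 0.25)         # variance of the moving-average part
    elif lag == 1:
        ma = sigma2 * 0.5                  # adjacent samples share one noise term
    else:
        ma = 0.0
    sinusoid = 2.0 * math.cos(0.5 * lag)   # amplitude 2 with uniformly random phase
    return ma + sinusoid

def is_positive_definite(matrix):
    """Return True when a Cholesky factorization of the matrix succeeds."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][p] * lower[j][p] for p in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True

R = [true_autocorrelation(k) for k in range(4)]
toeplitz = [[R[abs(i - j)] for j in range(4)] for i in range(4)]
print([round(r, 4) for r in R])
print(is_positive_definite(toeplitz))
```

The printed moments should agree with the values listed above to four decimals, and the factorization should succeed, which is consistent with the claim that $R_3$ is positive definite.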

Figure 1 shows the three particular power spectra that concern us here. First, the "moving-average +
sinusoids" power spectrum described above is shown with a dashed line. Then, an ME-power
spectrum which is consistent with $R_3$ and obtained following the maximum entropy ansatz is shown with
a dash-dotted line. Finally, the MR-power spectrum corresponding to the least smooth process is
shown with a continuous line.

All three power spectra shown are consistent with the covariance data. Hence, there is no suggestion
that one should be preferable. They all describe the same data. A selection can only be based on either
prior information or a prejudice; this is where an "ansatz" becomes relevant. Had we known that the
"true" spectrum originates from a moving average component plus a minimal number of sinusoids, we
could have recovered the exact power spectrum from the covariance data following, e.g., [6]. Of course,
such knowledge is rarely available and one is called to use other insights. Thus, if the power spectrum
and a model for the process are to be used for prediction purposes, the maximum entropy option is quite
natural since it represents the relevant "worst-case scenario." However, if the model is to be used for filling
in gaps in records, then the MR-option is the appropriate "worst-case scenario." Finally, if our goal is
simply to identify features in the power spectrum, either may be appropriate.

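Using © we determine that

$$f_{\rm ME}(\theta) = \frac{k_{\rm ME}^2}{|1 + a_1 e^{j\theta} + a_2 e^{2j\theta} + a_3 e^{3j\theta}|^2} \qquad (34)$$

with $k_{\rm ME}^2 = 1.2732$, and

$$[\, a_1 \;\; a_2 \;\; a_3 \,] = [\, -0.9026 \;\; 0.1829 \;\; 0.1465 \,].$$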
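
The ME coefficients can be cross-checked from the autocorrelation data. The text does not say how they were computed; the sketch below uses the Levinson-Durbin recursion, a standard way of fitting an autoregressive model to $R_0, \ldots, R_3$, with the rounded moments quoted earlier as input. The function name is ours, and reading the final prediction-error variance as the numerator of (34) is an assumption based on the $\frac{1}{2\pi}$ normalization used in (31).

```python
def levinson_durbin(r):
    """Return (a, err): AR polynomial a with a[0] = 1 and the final error variance."""
    a = [1.0]
    err = r[0]
    for m in range(1, len(r)):
        # Correlation of the current prediction error with the next lag.
        acc = sum(a[i] * r[m - i] for i in range(m))
        k = -acc / err                      # reflection coefficient
        # Order update: combine the polynomial with its reversed copy.
        a = [1.0] + [a[i] + k * a[m - i] for i in range(1, m)] + [k]
        err *= (1.0 - k * k)
    return a, err

r = [3.0000, 2.1552, 1.0806, 0.1415]       # moments quoted in the text
a, err = levinson_durbin(r)
print([round(c, 3) for c in a[1:]], round(err, 3))
```

Because the inputs are rounded to four decimals, the output should agree with the coefficients and the constant 1.2732 reported above only to about three decimals.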
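Similarly,

$$f_{\rm MR}(\theta) = \frac{1}{\sqrt{\lambda_0 + \sum_{k=1}^{3} \lambda_k \left(e^{jk\theta} + e^{-jk\theta}\right)}} \qquad (35)$$

$$\phantom{f_{\rm MR}(\theta)} = \frac{k}{|1 + d_1 e^{j\theta} + d_2 e^{2j\theta} + d_3 e^{3j\theta}|} \qquad (36)$$

with

$$[\, \lambda_0 \;\; \lambda_{\pm 1} \;\; \lambda_{\pm 2} \;\; \lambda_{\pm 3} \,] = [\, 3.4942 \;\; -2.5690 \;\; 0.9598 \;\; -0.1231 \,]$$

or, equivalently, $k = 1.2732$ and

$$[\, d_1 \;\; d_2 \;\; d_3 \,] = [\, -1.7673 \;\; 1.1795 \;\; -0.1956 \,].$$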



Here, again, we depart slightly from our earlier notation so as to compare the coefficients more directly
to the ME-spectral density. The parameters $b_\ell$ and $p_\ell$ as in Theorem 2 for $f_{\rm MR}$ and the smoothing filter,
respectively, can be readily determined from the above.

Figure 2 marks the zero of the moving average component of $f_{\rm true}$ (inside $\mathbb{D}$) along with the location
of the two spectral lines (on the unit circle) with "o". The poles of the ME-spectrum and the fractional
poles of the MR-spectrum are each marked with their own symbol.

Figure 3 presents realizations of time-series corresponding to $f_{\rm true}$, $f_{\rm ME}$, and $f_{\rm MR}$. The one corresponding
to $f_{\rm true}$ is generated by a Markovian moving-average model plus a sinusoidal component with a random
phase. The time-series corresponding to $f_{\rm ME}$ is generated by a Markovian autoregressive model as usual.
Finally, the time-series corresponding to $f_{\rm MR}$ is generated by a suitable discretization of the standard
spectral representation (stochastic integral)



$$u_k = \int_{-\pi}^{\pi} e^{jk\theta}\, dv(\theta),$$

where $dv(\theta)$ is a zero-mean white noise process for $\theta \in [-\pi, \pi)$ such that $\frac{1}{d\theta}\mathcal{E}\{|dv(\theta)|^2\} = f_{\rm MR}(\theta)$, see
e.g., [8, page 183]. There is no apparent observational feature distinguishing these three realizations, at
least over the window where they have been drawn, and hence, they are produced here only to satisfy
curiosity.
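
One way to discretize the spectral representation is sketched below. It is not necessarily the scheme used for Figure 3. The construction (a finite sum of cosines and sines with independent Gaussian amplitudes on a midpoint frequency grid) and all function names are our own choices, and the bin powers follow the $\frac{1}{2\pi}$ normalization of (31). For a density whose constants are fully listed in the text we use the ME spectrum, taking the numerator of (34) to be 1.2732 (an assumption); any other even density, such as $f_{\rm MR}$, can be passed in the same way.

```python
import math
import random

A_ME = [1.0, -0.9026, 0.1829, 0.1465]   # denominator coefficients quoted in the text
NUM_ME = 1.2732                         # ASSUMED numerator of (34)

def f_me(theta):
    """Autoregressive (ME) spectral density evaluated at one frequency."""
    re = sum(c * math.cos(i * theta) for i, c in enumerate(A_ME))
    im = sum(c * math.sin(i * theta) for i, c in enumerate(A_ME))
    return NUM_ME / (re * re + im * im)

def spectral_weights(f, m):
    """Midpoint grid on (0, pi) with power f(theta) * dtheta / pi per bin."""
    dtheta = math.pi / m
    thetas = [(i + 0.5) * dtheta for i in range(m)]
    weights = [f(t) * dtheta / math.pi for t in thetas]
    return thetas, weights

def model_autocovariance(thetas, weights, lag):
    """Exact autocovariance of the discretized model at the given lag."""
    return sum(w * math.cos(lag * t) for t, w in zip(thetas, weights))

def simulate(thetas, weights, n, rng):
    """Draw one real-valued Gaussian realization u_0, ..., u_{n-1}."""
    amps = [(math.sqrt(w) * rng.gauss(0.0, 1.0), math.sqrt(w) * rng.gauss(0.0, 1.0))
            for w in weights]
    return [sum(a * math.cos(k * t) + b * math.sin(k * t)
                for t, (a, b) in zip(thetas, amps))
            for k in range(n)]

thetas, weights = spectral_weights(f_me, 512)
series = simulate(thetas, weights, 100, random.Random(0))
print([round(model_autocovariance(thetas, weights, k), 2) for k in range(4)])
```

The autocovariance of the discretized model should be close to the data $R_0, \ldots, R_3$ when the density is consistent with them, which gives a quick sanity check on both the discretization and the spectrum before any realization is drawn.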



VIII. Concluding remarks

The present study sought to explore the issue of the time-arrow in the context of the maximum entropy
ansatz. When the index of a random process designates a variable other than time, the principle can be
called into question. A more abstract version of seeking spectra maximally noncommittal to unavailable
data, such as gaps in a record, suggests other alternatives, including the one studied herein.

At the moment, the information theoretic significance of $J(d\mu(\theta)/d\theta)$ is still under consideration.
However, it is clear that, in the same way that entropy rates relate to a level of "surprise" when tracking
the forward evolution of a random process, $J$ relates to a situation where we record new values of
a random process at widely separated gaps of an earlier record. Regarding the significance of MR-spectra
in time-series analysis, examples similar to the one that we presented here suggest similar qualities to the
ME-ones (though, admittedly, they are slightly less appealing in terms of their ease of computation).
Fig. 1. Power spectra consistent with $R_0$, $R_{\pm 1}$, $R_{\pm 2}$, $R_{\pm 3}$.
Fig. 2. Poles/zeros of $f_{\rm true}$, and singularities of $f_{\rm ME}$ and $f_{\rm MR}$.